People show up in comp.lang.perl.misc all the time asking how to use
the contents of a variable as the name of another variable. For
example, they have $foo = 'snonk', and then they want to operate on
the value of $snonk.
That's very easy to do in Perl, so they usually get some people to
tell them to do it. And they usually get some people asking them why
they didn't use a hash instead. Sometimes I'm one of the people who
says to use a hash instead, and sometimes I'm one of the people who
answers the question that was asked, even though I think they should
be using a hash instead.
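For concreteness, here is a minimal sketch of the two approaches side by side. The sub names bump_symbolic and bump_hash and the %value hash are made up for illustration; only $foo = 'snonk' comes from the example above.

```perl
use strict;
use warnings;

# Approach 1: symbolic reference. The string in $name selects a package
# variable. Symbolic references only reach package variables, never
# lexical (my) variables.
our $snonk = 10;

sub bump_symbolic {
    my ($name) = @_;
    no strict 'refs';        # strict forbids symbolic references
    return ++$$name;         # increments $main::snonk when $name is 'snonk'
}

# Approach 2: a hash. The string in $name is just a key.
my %value = (snonk => 10);

sub bump_hash {
    my ($name) = @_;
    return ++$value{$name};
}

my $foo = 'snonk';
print bump_symbolic($foo), "\n";   # operates on $snonk
print bump_hash($foo), "\n";       # operates on $value{snonk}
```

Both calls do the same job here. The difference is that the hash keeps the names in a container of their own, while the symbolic reference shares one pool with every other global in the package.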
Anyway, a couple of weeks ago one of my clients called up with some
program that was producing wrong reports. They needed it to be fixed
by the following day. The program was supposed to read a database with
records like these:
    this    red     something
    that    green   something else
    other   red     more
    this    blue    still more
and build a report of how many records had each value in each
position.
It turned out that the clods who had written this program had done
something like this:
    while (<RECORDS>) {
      chomp;
      @values = split /\t/, $_;
      foreach $v (@values) {
        $$v++;
      }
    }

    print <<EOM;
    Question 1:
    $this users said `this'. $that users said `that'.
    Question 2:
    $red users said their favorite color was red.
    ... (and so on) ...
    EOM
Of course, the actual code was much longer and much more obfuscated.
Anyway, to make a long story short, the problem turned out to be that
there was a certain response, let's say `foo' (actually, it was
`digoxin'---go figure), which was a valid response for two totally
unrelated questions, say #7a and #11. So anyone answering `foo' to
question 7a would be counted as having answered `foo' to question 11
as well, and vice versa. At the end of the analysis, the $foo
variable contained the *sum* of all the users who answered `foo' to
either question 7a or to question 11. Then the reports used this sum
in two places, and that's why the reports were inaccurate.
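The collision is easy to reproduce on a small scale. The following is only a sketch with made-up data (three users, two questions), not the client's code; the sub name count_with_symrefs is hypothetical.

```perl
use strict;
use warnings;

our ($foo, $bar, $baz);    # the globals that the answers will land in

# Count answers the way the broken program did: one global per answer,
# with no record of which question the answer belonged to.
sub count_with_symrefs {
    my @records = @_;
    no strict 'refs';
    for my $record (@records) {
        for my $answer (@$record) {
            $$answer++;
        }
    }
    return;
}

count_with_symrefs(
    [ 'foo', 'bar' ],    # user 1 answered `foo' to the first question
    [ 'baz', 'foo' ],    # user 2 answered `foo' to the second question
    [ 'foo', 'foo' ],    # user 3 answered `foo' to both
);

print "foo: $foo\n";     # prints "foo: 4"
```

Two users said `foo' to each question, but $foo holds 4, and nothing in the program can split that number back into its two halves.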
This shoddy logic was so pervasive in the program that I couldn't find
an easy way to fix it. If the original programmers had used a series
of hashes instead of stuffing everything into a bunch of global
variables, it would never have happened, or at worst it would have
been easy to fix. I ended up doing a major overhaul on the program to
solve the problem. The main loop turned into something more like:
    while (<RECORDS>) {
      chomp;
      @values = split /\t/, $_;
      for ($i=1; $i <= $NUMQUESTIONS; $i++) {
        my $v = shift @values;
        $count[$i]{$v}++;
      }
    }
Of course, the actual code was much longer and much more obfuscated,
although it was neither as long nor as obfuscated as when I got to work
on it.
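A self-contained sketch of the same idea, using the sample records from earlier, might look like this. It assumes the fields are tab-separated, reads from a list of strings instead of the RECORDS filehandle, and uses a made-up sub name, tally_records.

```perl
use strict;
use warnings;

# Tally tab-separated records into one hash per field position.
# $count->[$i]{$v} is the number of records whose field number $i
# (counting from 1) had the value $v.
sub tally_records {
    my @lines = @_;
    my @count;
    for my $line (@lines) {
        chomp $line;
        my @values = split /\t/, $line;
        for my $i (1 .. @values) {
            $count[$i]{ $values[$i - 1] }++;
        }
    }
    return \@count;
}

my $count = tally_records(
    "this\tred\tsomething\n",
    "that\tgreen\tsomething else\n",
    "other\tred\tmore\n",
    "this\tblue\tstill more\n",
);

# Report each position separately; identical values in different
# positions are counted in different hashes.
for my $i (1 .. $#$count) {
    print "Position $i:\n";
    for my $v (sort keys %{ $count->[$i] }) {
        print "  $count->[$i]{$v} records said `$v'\n";
    }
}
```

Because each position gets its own hash, a value that is legal in two positions is simply a key in two different hashes, and the two counts never meet.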
I shudder to think what would have happened to this program if one of
the responses had been named `i' or `v' or `3' or some such. One can
even imagine that that happened once upon a time, and the response was
to change the name instead of taking the warning.
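Here is a sketch of the kind of damage I mean, assuming a program that keeps its loop counter in a global $i. The sub count_processed and the list of responses are invented for the demonstration.

```perl
use strict;
use warnings;

our $i;    # a global loop counter, hypothetical but typical of such programs

# Tally each response into a global named after it, and return how many
# responses the loop actually visited.
sub count_processed {
    my @responses = @_;
    my $processed = 0;
    no strict 'refs';
    for ($i = 0; $i < @responses; $i++) {
        my $v = $responses[$i];
        $$v++;           # when $v is 'i', this bumps the loop counter itself
        $processed++;
    }
    return $processed;
}

print count_processed(qw(red i blue green)), "\n";   # prints 3, not 4
```

The response `i' advances the counter an extra step, so `blue' is silently skipped. No error is raised; the report is just wrong.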
Anyway, deriving the name of the variable from an input value turned
out to be a very stupid decision in this case, and one which cost my
client a couple of thousand dollars.
When people come into comp.lang.perl.misc asking how to do something
stupid, I'm never quite sure what to do. I can just answer the
question as asked, figuring that it's not my problem to tell people
that they're being stupid. That's in my self-interest, because it
takes less time to answer the question that way, and because someone
might someday pay me to clean up after their stupidity, as happened in
this instance. But if I do that, people might jump on me for being a
smart aleck, which has happened at times. (``Come on, help the poor
guy out; if you know what he really needs why don't you just give it to
him?'')
On the other hand, I could try to answer on a different level, present
a better solution, and maybe slap a little education on `em. That's
nice when it works, but if it doesn't it's really sad to see your hard
work and good advice ignored. Also, people tend to jump on you for
not answering the question. (``Who are you to be telling this guy
what he should be doing? Just answer the question.'')
I guess there's room for both kinds of answer. Or maybe there isn't
room for either kind.
Whatever. I seem to have gone off on a tangent. The real root of the
problem code is: It's fragile. You're mingling unlike things when you
do this. And if two of those unlike things happen to have the same
name, they'll collide and you'll get the wrong answer. So you end up
having a whole long list of names which you have to be careful not to
reuse, and if you screw up, you get a very bizarre error. This is
precisely the problem that namespaces were invented to solve, and
that's just what a hash is: A portable namespace.
The main point of this article was to present a real example of a case
where using a variable as a variable name was a really stupid thing to
do. Since most of the people who post about that in
comp.lang.perl.misc seem to be trying to do the same stupid thing in
the same stupid way, I thought I'd mention it, and maybe raise the
general awareness of this problem.